Introduction

Descriptive Analysis

The raw dataset contains 7,728,394 observations (rows) of 46 variables (columns).

After data preparation and cleaning, the dataset contains 7,546,771 observations (rows) of 53 variables (columns): 181,623 fewer rows and 7 more columns than the raw data.

Severity        Number of Accidents
least severe                 66,121
less severe               6,010,987
more severe               1,272,321
most severe                 197,342

The author defines severity as “the impact on traffic.” Low-severity accidents have a minimal effect on traffic, whereas high-severity accidents have a significant one.

Most accidents recorded between 2016 and 2023 were categorized as “less severe”: 6,010,987 of the 7,546,771 accidents, or about 79.6%.
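
The share of each severity category follows directly from the counts reported above. The following is a minimal Python sketch, assuming the four counts are entered by hand rather than read from the cleaned dataset; the names severity_counts and severity_shares are illustrative and do not come from the original analysis.

```python
# Counts copied by hand from the severity table in this section.
severity_counts = {
    "least severe": 66_121,
    "less severe": 6_010_987,
    "more severe": 1_272_321,
    "most severe": 197_342,
}


def severity_shares(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


total = sum(severity_counts.values())
shares = severity_shares(severity_counts)

# Print one row per category: label, count, and percentage share.
print(f"total accidents: {total:,}")
for label, pct in shares.items():
    print(f"{label:<13}{severity_counts[label]:>10,}{pct:>7.1f}%")
```

Summing the counts also serves as a consistency check: the total should equal the 7,546,771 observations in the cleaned dataset.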

Statistical Analysis